Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add non-atomic multi-msg transactions #15038

Closed

Conversation

nicolaslara
Copy link
Contributor

@nicolaslara nicolaslara commented Feb 15, 2023

Description

Closes: #13911

This adds support for non-atomic multi-msg transactions.

It introduces a NonAtomic boolean field in the transaction proto that defaults to false. If this is not set, everything should behave as before. If set, the transaction will succeed if at least one message succeeds.

If all messages fail, the whole transaction will fail the same way as atomic transactions do.

State changes form failed messages should be reverted.

If the tx succeeds, but some messages failed. The resulting TxMsgData should contain an array of message responses that matches the order of the submitted messages. Successes will contain the MsgResponse and failures will contain an empty response created specifically for this: /cosmos.tx.v1beta1.MsgFailureResponse


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@nicolaslara nicolaslara changed the title initial setup with proto changes and tests Add non-atomic multi-msg transactions Feb 15, 2023
@tac0turtle
Copy link
Member

hey let us know when youd like some feedback on this pr

@nicolaslara nicolaslara changed the title Add non-atomic multi-msg transactions feat: add non-atomic multi-msg transactions Mar 8, 2023
@nicolaslara
Copy link
Contributor Author

@tac0turtle I don't need a full review rn, but some feedback on the general approach would be great! Basically, do we want to:

  • Add field to tx proto
  • Modify RunMsgs to do atomic vs non-atomic based on that field
  • Default to non-atomic

If so, then my next steps are:

  • Cache the context for each message
  • Check that results and errors are properly emitted and that we can map them to the original message

Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea of this makes sense to me. Is this something you intend to complete @nicolaslara? Really cool to see someone working on this 🙏

@nicolaslara
Copy link
Contributor Author

yeah, I want to complete it soon, just didn't want to spend too much time on it before validating the approach.

@nicolaslara nicolaslara marked this pull request as ready for review March 16, 2023 16:27
@nicolaslara nicolaslara requested a review from a team as a code owner March 16, 2023 16:27
@nicolaslara
Copy link
Contributor Author

@tac0turtle @alexanderbez This is ready for feedback now.

There are two test failures that I'm not sure how to get around:

  • Concensus warn. It's unclear to me what this is checking.
  • Tx contains new fields (yes, that's on purpose)

Other than that I think this covers everything we need.

baseapp/baseapp.go Outdated Show resolved Hide resolved
types/errors/errors.go Outdated Show resolved Hide resolved
@amaury1093 amaury1093 mentioned this pull request Mar 22, 2023
39 tasks
Copy link
Contributor

@amaury1093 amaury1093 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

proto/cosmos/tx/v1beta1/tx.proto Outdated Show resolved Hide resolved
baseapp/baseapp.go Outdated Show resolved Hide resolved
baseapp/baseapp.go Outdated Show resolved Hide resolved
@@ -87,6 +87,7 @@ type StdTx struct {
Signatures []StdSignature `json:"signatures" yaml:"signatures"`
Memo string `json:"memo" yaml:"memo"`
TimeoutHeight uint64 `json:"timeout_height" yaml:"timeout_height"`
NonAtomic bool `json:"non_atomic" yaml:"non_atomic"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to sign a NonAtomic tx with AMINO_JSON, we need to modify the ledger app to show this boolean field in a new screen.

I propose to never allow NonAtomic txs with SIGN_MODE_AMINO_JSON. Does this sound reasonable?

If so, could you add it in this PR, maybe in amino's sign mode handler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable. I'll look into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started adding the signing mode validation in osmosis-labs@09eb3a4. It's a bit messy because the tx has to be made into a concrete type, which normally doesn't happen in baseapp.

I'm not sure what a reliable way to test this would be though. Do you know of a good way to generate amino signed txs in tests that I can use for this?

//
// If set to true, the transaction will be executed as long as at least one
// of the messages succeeds.
bool non_atomic = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this is merged, we need to add a field in the Textual enveloppe value renderer to show this field too. I added an item in #11970 to track this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's on this same PR now 👌

Copy link
Member

@facundomedica facundomedica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thank you for adding the stuff needed for signmode textual. I'll test it today and approve 👌

Comment on lines 84 to 91
if err := k.event.EventManager(ctx).EmitKV(
ctx,
"update_consensus_params",
event.Attribute{Key: "authority", Value: req.Authority},
event.Attribute{Key: "parameters", Value: consensusParams.String()}); err != nil {
return nil, err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for this?

//
// If set to true, the transaction will be executed as long as at least one
// of the messages succeeds.
bool non_atomic = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's on this same PR now 👌

Copy link
Member

@facundomedica facundomedica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. btw sign mode textual doesn't work with this as is but I'll have a fix soon, not caused by this pr tho.

@github-actions github-actions bot added the C:Rosetta Issues and PR related to Rosetta label May 12, 2023
if err != nil {
return nil, errorsmod.Wrapf(err, "failed to execute message; message index: %d", i)
wrappedErr := errorsmod.Wrapf(err, "failed to execute message; message index: %d", i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is unsafe. The block header includes the following information as determinstic:

  • Code
  • Data (marshalled msg responses)
  • GasWanted
  • GasUsed

This combination creates a potential halting scenario. This code will include the full error string from the msg exeuction into the msg responses, thus causing the error string to know be included in the block. If the error string is undeterminstic (happens with json marshal errors) or it is changed (a backport changes error string formatting), a chain would halt. We have this same issue with writing error strings into acknowledgements for IBC, which is common occurrence with interchain accounts

To increase the readability of these error messages, I'd like to ask cometbft to include the codespace in the block header as well (I will open an issue), but I also think there should potentially be a more radical change to include the base error message as well. The original design was to use the Code as a condensed error string with the idea that that clients would lookup the code to get a more descriptive error string, but this is infeasible with IBC as a client would need to look up a different chain's error code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. A simple solution would be to extract error code via errorsmod.ABCIInfo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic which is unsafe here seems to be the same as it was before, id opt to not return any sort of error to comet which gets included in consensus, (wiping the included data on errors that gets included in the header). I think this is fundamentally unsafe and its surprising no one has halted because of this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between before and after is that now the error is being included in the msgResponses which will eventually be marshalled and placed into the Data field (which is a determinsitic field).

Existing flow:

Thus the non-determinstic parts of an error returned on a failed msg execution are removed for the determinstic fields of the deliver tx response. In the new flow, the full error is put into the msg response below on line 843, which eventually ends up in the Data field which could lead to an app hash mismatch

One tricky part is that you likely want the errors to end up in the deliver tx response as well since the log is non-determinstic at the moment and can provide useful information

Copy link
Contributor Author

@nicolaslara nicolaslara May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having the error that gets included in the error responses be just the ABCIInfo end emit the context as events or logs would solve this.

Also, hot take: consensus failures on error string differences should be irrelevant as long as all validators return an error, and there should be a way to fix this post-hoc at a performance penalty (though that's a much larger issue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change so that errors are converted to ABCI errors in 5a0ffba

@github-actions
Copy link
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Jun 29, 2023
@tac0turtle
Copy link
Member

@testinginprod could we get a review on this pr

@github-actions github-actions bot removed the Stale label Jun 30, 2023
Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor nits, otherwise ACK 🎉

baseapp/baseapp.go Outdated Show resolved Hide resolved
baseapp/baseapp.go Outdated Show resolved Hide resolved
baseapp/baseapp.go Outdated Show resolved Hide resolved
Copy link
Contributor

@08d2 08d2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙅

Transactions are a concept that assert precisely one property: atomic execution. If a transaction doesn't provide atomic execution, then it is not a transaction, full stop.

You can define something else that provides partial execution, that's fine! But you can't redefine transaction to sometimes be atomic, and sometimes not. No go.

/cc @ebuchman @zmanian

Copy link
Collaborator

@odeke-em odeke-em left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nicolaslara, please take a look at these few comments.

}

for testNum, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a fesh copy of tc and testNum that ensure that the values will be afresh even if the test is parallelized, so:

   testNum, tc := testNum, tc
   t.Run(tc.name, func(t *testing.T) {
       ...

// replace the second message with a Counter2
tx = newTxCounter(t, suite.txConfig, NonAtomic, int64(testNum+1), 3)

builder := suite.txConfig.NewTxBuilder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to declare this variable close too where it is used.

Comment on lines +123 to +124
// If set to true, the transaction will be executed as long as at least one
// of the messages succeeds.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind ping @nicolaslara!

Comment on lines +123 to +124
// If set to true, the transaction will be executed as long as at least one
// of the messages succeeds.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind ping @nicolaslara!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Oct 5, 2023
@github-actions github-actions bot closed this Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: nonatomic multimsg